13 research outputs found

    Prototyping a Web-Scale Multimedia Retrieval Service Using Spark

    The world has experienced phenomenal growth in data production and storage in recent years, much of which has taken the form of media files. At the same time, computing power has become abundant with multi-core machines, grids, and clouds. Yet it remains a challenge to harness the available power and move toward gracefully searching and retrieving from web-scale media collections. Several researchers have experimented with using automatically distributed computing frameworks, notably Hadoop and Spark, for processing multimedia material, but mostly using small collections on small computing clusters. In this article, we describe a prototype of a (near) web-scale throughput-oriented multimedia (MM) retrieval service using the Spark framework running on the AWS cloud service. We present retrieval results using up to 43 billion SIFT feature vectors from the public YFCC 100M collection, making this the largest high-dimensional feature vector collection reported in the literature. We also present a publicly available demonstration retrieval system, running on our own servers, where the implementation of the Spark pipelines can be observed in practice using standard image benchmarks, and downloaded for research purposes. Finally, we describe a method to evaluate retrieval quality of the ever-growing high-dimensional index of the prototype, without actually indexing a web-scale media collection.

    Exquisitor: Breaking the Interaction Barrier for Exploration of 100 Million Images

    In this demonstration, we present Exquisitor, a media explorer capable of learning user preferences in real time during interactions with the 99.2 million images of YFCC100M. Exquisitor owes its efficiency to innovations in data representation, compression, and indexing. Exquisitor can complete each interaction round, including learning preferences and presenting the most relevant results, in less than 30 ms using only a single CPU core and modest RAM. In short, Exquisitor can bring large-scale interactive learning to standard desktops and laptops, and even high-end mobile devices.

    General Game Playing Puzzles Solved with Informed Search Methods [General Game Playing þrautir leystar með upplýstum leitaraðferðum]

    Project report. One of the challenges of General Game Playing (GGP) is to solve puzzles effectively. Solving puzzles has more in common with planning than with the search methods used for two-player or multi-player games. General problem solving has been a topic addressed by the planning community for years. In this thesis we adapt heuristic search methods from automated planning for use in solving single-agent GGP puzzles. One of the main differences between planning and GGP is the real-time nature of GGP competitions. The backbone of our puzzle solver is a real-time variant of the classical A* search algorithm that we call Time-Bounded and Injection-based A* (TBIA*). TBIA* is a complete algorithm that always maintains a best known path to follow and updates this path with new and better paths as they are discovered. The heuristic TBIA* uses is constructed automatically for each puzzle being solved and is based on techniques used in the Heuristic Search Planner system. It is composed of two parts: the first is a distance estimate derived from solving a relaxed problem, and the second is a penalty for every unachieved sub-goal. The heuristic is inadmissible when the penalty is added but is typically more informative. We also present a caching mechanism to enhance the heuristic's performance and a self-regulating method, which we call adaptive k, that balances cache usage. We show that our method both adds to the range of GGP puzzles solvable under real-time settings and outperforms existing simulation-based solution methods on a number of puzzles.
    Icelandic abstract (translated): One of the tasks of general game players is to deal with single-player games, or puzzles. Solving such puzzles is very different from playing against opponents and has more in common with planning algorithms. This thesis builds on many years of research on general planning algorithms and transfers those methods to the world of general game players. The main difference between planning and general game playing is that game playing is subject to time limits, with players given a start clock and a play clock. The core of our solution method is a real-time implementation of the A* search method that we call Time-Bounded and Injection-based A*. The heuristic we use builds on ideas from the Heuristic Search Planner planning software and has two parts: the distance to the goal is estimated by solving a simplified version of the problem, and a penalty is added for every unfulfilled goal condition. Because a single action can fulfill more than one goal condition, our heuristic is not guaranteed to be admissible, but in many cases it is much closer to the true cost, which in turn speeds up the search. Since the heuristic is time-consuming to compute, we introduce a lookup method that speeds up heuristic evaluation. We also have a self-tuning decision procedure, which we call adaptive k, that uses lookups according to their quality. We show that these methods work well on a number of the puzzles used in international competitions and that we have added to the number of puzzles that can be solved.
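The two-part heuristic described in this abstract (a relaxed-problem distance estimate plus a penalty for every unachieved sub-goal) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy puzzle (a token on a line that must visit two flag positions), the function names, and the penalty weight are hypothetical, and the search is plain A*, not the authors' time-bounded TBIA*.

```python
import heapq
from itertools import count

# Hypothetical toy puzzle: a token on cells 0..6 must visit every flag cell.
FLAG_POSITIONS = (2, 5)
START = (3, (False, False))  # (token position, flag-visited markers)


def successors(state):
    """Yield (next_state, cost) for moving the token one cell left or right."""
    pos, flags = state
    for delta in (-1, 1):
        new_pos = pos + delta
        if 0 <= new_pos <= 6:
            new_flags = tuple(f or new_pos == fp
                              for f, fp in zip(flags, FLAG_POSITIONS))
            yield (new_pos, new_flags), 1


def is_goal(state):
    return all(state[1])


def relaxed_distance(state):
    """Admissible estimate: distance to the farthest unvisited flag."""
    pos, flags = state
    remaining = [abs(pos - fp) for f, fp in zip(flags, FLAG_POSITIONS) if not f]
    return max(remaining, default=0)


def penalized_heuristic(state, penalty):
    """Relaxed distance plus a penalty per unachieved sub-goal.

    With penalty > 0 the estimate may exceed the true cost (inadmissible).
    """
    unachieved = sum(1 for f in state[1] if not f)
    return relaxed_distance(state) + penalty * unachieved


def a_star(start, heuristic):
    """Plain A* with re-opening; returns a list of states or None."""
    tie = count()  # tie-breaker so the heap never compares states
    frontier = [(heuristic(start), next(tie), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if g > best_g.get(state, float("inf")):
            continue  # stale queue entry
        if is_goal(state):
            return path
        for nxt, cost in successors(state):
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(frontier, (new_g + heuristic(nxt), next(tie),
                                          new_g, nxt, path + [nxt]))
    return None


def solve(penalty):
    return a_star(START, lambda s: penalized_heuristic(s, penalty))


if __name__ == "__main__":
    for p in (0, 1):
        print(f"penalty={p}: {len(solve(p)) - 1} moves")
```

With `penalty=0` the estimate stays admissible and the search returns a shortest solution; with a positive penalty the optimality guarantee is lost, which is the trade-off the abstract describes.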

    Impact of Storage Technology on Cluster-Based High-Dimensional Indexing

    The scale of multimedia data collections is expanding at a very fast rate. In order to cope with this growth, the high-dimensional indexing methods used for content-based multimedia retrieval must adapt gracefully to secondary storage. Recent progress in storage technology, however, means that algorithm designers must now cope with a spectrum of secondary storage solutions, ranging from traditional magnetic hard drives to state-of-the-art solid-state disks. This paper studies the impact of storage technology on a simple, prototypical high-dimensional indexing method for large-scale query processing. We show that while the algorithm implementation deeply impacts the performance of the indexing method, the setup of the underlying storage technology is equally important.

    Distributed High-Dimensional Index Creation using Hadoop, HDFS and C++

    This paper presents an initial study where the creation of a high-dimensional index is made parallel and distributed by using the Hadoop framework. Early experimental results show substantial performance gains, despite the fact that the Hadoop framework is loosely coupled to the C++ based index creation. Two main lessons can be drawn from this work: (i) it is key to invest time, energy and manpower to re-implement the code in Java in order to benefit from all the features of Hadoop (although our results are already impressive, even better performance gains will be observed if the index creation is re-implemented in Java); and (ii) special care must be taken to account for the networking topology to prevent message exchanges from becoming the new bottleneck, when parallelism fixes the CPU bottleneck and HDFS the I/O bottleneck.

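    The Background of Football Players in Iceland: A Study of Players in the Pepsi League, 1st Division, and 2nd Division [Bakgrunnur knattspyrnumanna á Íslandi: Rannsókn á leikmönnum í Pepsi deild, 1. deild og 2. deild]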

    The study examines the background of football players in men's senior teams. The participants were 114 players in the top three divisions in Iceland. A quantitative research method was used, and a questionnaire was administered to the players. The results were varied, since many variables were examined. The great majority of the players had been on the A team in the youth age categories, and players from teams below the A team appear less likely to make it to the senior team. Players appear to have made large strides in ability between the 5th and the 3rd youth age categories (5. flokkur to 3. flokkur), especially those who now play in the second division. Players considered mental attributes to be the most important factor in going far as footballers. The results show that a birth-date effect is present in the youth age categories, but the effect appears to diminish as age increases. Parental support is generally rather high. Overtraining is a problem in the youth age categories: just under a third of participants had suffered overuse injuries due to overtraining. Players who often played up an age category were much more likely to suffer overuse injuries. The results also indicate that children who start practicing football early in life have a better chance of going further in the future. The results further suggest that varied sports participation is beneficial and that specialization in a single sport is not desirable at younger ages.

    Data Storage and Management for Big Multimedia
